AITopics

2605.1426

Country: North America > Canada (0.46)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

arXiv.org Machine Learning May-13-2026

TOPPO: Rethinking PPO for Multi-Task Reinforcement Learning with Critic Balancing

Li, Yuanpeng, Lin, Gefei, Qu, Annie, Miao, Rui

Soft Actor-Critic (SAC) and its variants dominate Multi-Task Reinforcement Learning (MTRL) due to their off-policy sample efficiency, while on-policy methods such as Proximal Policy Optimization (PPO) remain underexplored. We diagnose that PPO in MTRL suffers from a previously overlooked issue: critic-side gradient ill-conditioning, which may cause tail tasks to stall while easy tasks dominate the value function's updates. To address this, we propose TOPPO (Tail-Optimized PPO), a reformulation of PPO via Critic Balancing -- a set of modules that improve gradient conditioning and balance learning dynamics across tasks. Unlike prior approaches that rely on modular architectures or large models, TOPPO targets the optimization bottleneck within PPO itself. Empirically, TOPPO achieves stronger mean and tail-task performance than published SAC-family and ARS-family baselines while using substantially fewer parameters and environment steps on the Meta-World+ benchmark. Notably, TOPPO matches or surpasses strong SAC baselines early in training and maintains superior performance at full budget. Ablations confirm the effectiveness of each module in TOPPO and provide insights into their interactions. Our results demonstrate that, with proper optimization, on-policy methods can rival or exceed off-policy approaches in MTRL, challenging the prevailing reliance on SAC and highlighting critic-side gradient conditioning as the central bottleneck.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

2605.11473

Genre: Research Report > New Finding (0.85)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

Frazier, David T., Wang, Hui

Concentration and Calibration in Predictive Bayesian Inference

arXiv.org Machine Learning May-4-2026

Predictive Bayesian inference (PBI) represents a model- and prior-agnostic approach to standard Bayesian inference which allows users to quantify uncertainty for a functional of interest only by specifying a forward predictive model for future unobserved data. The flexibility and generality of this framework have led to a host of novel algorithms for implementing this approach, and many empirical applications, yet the reliability of the resulting inferences for the underlying statistical functional of interest remains unclear. Herein, we demonstrate that when using PBI for a population functional of interest, the resulting posterior concentrates onto a well-defined quantity that explicitly depends on the forward predictive model used to implement the predictive recursion underlying the method. Furthermore, the forward predictive model entirely determines the uncertainty quantification produced in PBI. Consequently, our results show that if the predictive model does not capture all relevant features of the data, then, even in very simple examples, the coverage of predictive Bayes credible sets for the population value of the functional of interest can be arbitrarily close to zero. We carefully explain why this occurs, and show that this behavior is directly tied to the inaccuracy of the forward predictive model used to produce future observations within the PBI framework. As a consequence, our results imply that in order for PBI to deliver calibrated posterior inferences, the predictive engine used to generate posterior samples must contain, in a well-defined sense, the true data-generating process (DGP); otherwise, inferences generated under this framework will not be calibrated.

artificial intelligence, bayesian inference, modeling & simulation, (15 more...)

2605.00455

Genre: Research Report > New Finding (0.74)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)

Neural Information Processing Systems Apr-30-2026, 19:48:37 GMT

018b59ce1fd616d874afad0f44ba338d-AuthorFeedback.pdf

artificial intelligence, log mn, machine learning, (18 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.31)

arXiv.org Machine Learning Apr-28-2026

The Optimal Sample Complexity of Multiclass and List Learning

Pabbaraju, Chirag

While the optimal sample complexity of binary classification in terms of the VC dimension is well-established, determining the optimal sample complexity of multiclass classification has remained open. The appropriate complexity parameter for multiclass classification is the DS dimension, and despite significant efforts, a gap of $\sqrt{\text{DS}}$ has persisted between the upper and lower bounds on sample complexity. Recent work by Hanneke et al. (2026) shows a novel algebraic characterization of multiclass hypothesis classes in terms of their DS dimension. Building on this, we show that the maximum hypergraph density of any multiclass hypothesis class is upper-bounded by its DS dimension. This proves a longstanding conjecture of Daniely and Shalev-Shwartz (2014). As a consequence, we determine the optimal dependence of the sample complexity on the DS dimension for multiclass as well as list learning.

artificial intelligence, inductive learning, machine learning, (16 more...)

2604.24749

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.62)

Neural Information Processing Systems Apr-26-2026, 12:58:19 GMT

Sharp Analysis of Stochastic Optimization under Global Kurdyka-Łojasiewicz Inequality

We study the complexity of finding the global solution to stochastic nonconvex optimization when the objective function satisfies a global Kurdyka-Łojasiewicz (KŁ) inequality and the queries from stochastic gradient oracles satisfy a mild expected smoothness assumption. We first introduce a general framework to analyze Stochastic Gradient Descent (SGD) and its associated nonlinear dynamics under this setting. As a byproduct of our analysis, we obtain a sample complexity of $O(\epsilon^{-(4-\alpha)/\alpha})$ for SGD when the objective satisfies the so-called α-PŁ condition, where α is the degree of gradient domination. Furthermore, we show that a modified SGD with variance reduction and restarting (PAGER) achieves an improved sample complexity of $O(\epsilon^{-2/\alpha})$ when the objective satisfies the average smoothness assumption. This leads to the first optimal algorithm for the important case of α = 1, which appears in applications such as policy optimization in reinforcement learning.
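
As a rough illustration of the setting in the abstract above (not the paper's framework and not PAGER), the sketch below runs plain SGD on the toy objective f(x) = x²/2. This objective satisfies a gradient-domination inequality of the form |f'(x)|^α ≥ c (f(x) − f*) with α = 2, c = 2 and f* = 0. The objective, the Gaussian noise model, the 1/(t+1) step size, and the names `f`, `grad` and `sgd` are all assumptions made for this example.

```python
import random


def f(x):
    """Toy objective f(x) = x^2 / 2, with minimum value f* = 0 at x = 0."""
    return 0.5 * x * x


def grad(x):
    """Exact gradient of the toy objective."""
    return x


def sgd(x0, num_steps, noise_std=0.1, rng=None):
    """Plain SGD with a noisy gradient oracle and step size 1 / (t + 1).

    The additive Gaussian noise stands in for a stochastic gradient oracle;
    it is an illustrative assumption, not the oracle model of the paper.
    """
    rng = rng or random.Random()
    x = x0
    for t in range(num_steps):
        noisy_grad = grad(x) + rng.gauss(0.0, noise_std)
        x -= noisy_grad / (t + 1)
    return x


if __name__ == "__main__":
    x_final = sgd(5.0, 2000, rng=random.Random(0))
    print("final suboptimality:", f(x_final))
```

With this step size the iterate after the first step is a running average of the noise terms, so the suboptimality shrinks as more samples are drawn. The rates quoted in the abstract concern general α and are not reproduced by this toy run.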

artificial intelligence, complexity, machine learning, (13 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.76)

Neural Information Processing Systems Apr-25-2026, 10:29:14 GMT

33d6548e48d4318ceb0e3916a79afc84-Supplemental.pdf

artificial intelligence, machine learning, probability, (19 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.93)

Neural Information Processing Systems Apr-24-2026, 18:11:51 GMT

Quantum Speedups of Optimizing Approximately Convex Functions with Applications to Logarithmic Regret Stochastic Convex Bandits

We initiate the study of quantum algorithms for optimizing approximately convex functions. Given a convex set $K \subseteq \mathbb{R}^n$ and a function $F\colon \mathbb{R}^n \to \mathbb{R}$ such that there exists a convex function $f\colon K \to \mathbb{R}$ satisfying $\sup_{x \in K} |F(x) - f(x)| \le \epsilon/n$, our quantum algorithm finds an $x^* \in K$ such that $F(x^*) - \min_{x \in K} F(x) \le \epsilon$ using $O(n^3)$ quantum evaluation queries to $F$. This achieves a polynomial quantum speedup compared to the best-known classical algorithms. As an application, we give a quantum algorithm for zeroth-order stochastic convex bandits with $O(n^5 \log^2 T)$ regret, an exponential speedup in $T$ compared to the classical $\Omega(\sqrt{T})$ lower bound. Technically, we achieve quantum speedup in $n$ by exploiting a quantum framework of simulated annealing and adopting a quantum version of the hit-and-run walk. Our speedup in $T$ for zeroth-order stochastic convex bandits is due to a quadratic quantum speedup in multiplicative error of mean estimation.
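
For readers unfamiliar with the hit-and-run walk mentioned in the abstract above, here is a minimal classical sketch, assuming the convex set is the Euclidean unit ball and the target distribution is uniform. It is not the quantum walk of the paper and involves no simulated annealing; the function name `hit_and_run_unit_ball` and the choice of the unit ball are assumptions made for this example.

```python
import math
import random


def hit_and_run_unit_ball(start, num_steps, rng=None):
    """Classical hit-and-run walk targeting the uniform distribution on the unit ball.

    Each step picks a uniformly random direction u, computes the chord of the
    ball through the current point x along u, and moves to a uniformly random
    point on that chord.
    """
    rng = rng or random.Random()
    x = list(start)
    samples = []
    for _ in range(num_steps):
        # Uniform random direction: normalize a standard Gaussian vector.
        g = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in g))
        u = [v / norm for v in g]
        # Chord endpoints solve t^2 + 2 t <x, u> + ||x||^2 - 1 = 0.
        xu = sum(a * b for a, b in zip(x, u))
        disc = max(0.0, xu * xu - (sum(a * a for a in x) - 1.0))
        root = math.sqrt(disc)
        t = rng.uniform(-xu - root, -xu + root)
        x = [a + t * b for a, b in zip(x, u)]
        samples.append(tuple(x))
    return samples


if __name__ == "__main__":
    pts = hit_and_run_unit_ball([0.0, 0.0, 0.0], 1000, rng=random.Random(0))
    print("last sample:", pts[-1])
```

For a general convex body the chord would be found with a membership oracle and a line search; the closed-form chord used here is specific to the ball.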

algorithm, artificial intelligence, machine learning, (17 more...)

Country: North America > United States (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Hardware (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Neural Information Processing Systems Mar-23-2026, 06:19:11 GMT

A Constant-Factor Bi-Criteria Approximation Guarantee for k-means++

Dennis Wei

This paper studies the k-means++ algorithm for clustering as well as the class of $D^\ell$ sampling algorithms to which k-means++ belongs. It is shown that for any constant factor β > 1, selecting βk cluster centers by $D^\ell$ sampling yields a constant-factor approximation to the optimal clustering with k centers, in expectation and without conditions on the dataset. This result extends the previously known O(log k) guarantee for the case β = 1 to the constant-factor bi-criteria regime. It also improves upon an existing constant-factor bi-criteria result that holds only with constant probability.
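
The seeding scheme the abstract refers to can be sketched as follows, under the usual description of k-means++: the first center is drawn uniformly, and each later center is drawn with probability proportional to the distance to the nearest chosen center raised to the power ℓ (ℓ = 2 gives k-means++ seeding). The helper names `d_ell_sampling`, `clustering_cost` and `bicriteria_seed`, and the rounding of βk up to an integer, are assumptions made for this example rather than anything taken from the paper.

```python
import math
import random


def _dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def d_ell_sampling(points, num_centers, ell=2.0, rng=None):
    """Choose `num_centers` centers from `points` by D^ell sampling."""
    rng = rng or random.Random()
    centers = [rng.choice(points)]  # first center: uniform at random
    # weights[i] = (distance from points[i] to its nearest center) ** ell
    weights = [_dist(p, centers[0]) ** ell for p in points]
    while len(centers) < num_centers:
        if sum(weights) == 0.0:
            # Every point coincides with a center; fall back to uniform.
            nxt = rng.choice(points)
        else:
            nxt = rng.choices(points, weights=weights, k=1)[0]
        centers.append(nxt)
        weights = [min(w, _dist(p, nxt) ** ell) for w, p in zip(weights, points)]
    return centers


def clustering_cost(points, centers, ell=2.0):
    """Sum over points of the ell-th power of the distance to the nearest center."""
    return sum(min(_dist(p, c) for c in centers) ** ell for p in points)


def bicriteria_seed(points, k, beta, ell=2.0, rng=None):
    """Oversample: pick ceil(beta * k) centers instead of k (beta > 1)."""
    return d_ell_sampling(points, math.ceil(beta * k), ell=ell, rng=rng)


if __name__ == "__main__":
    grid = [(float(i % 5), float(i // 5)) for i in range(25)]
    seeds = bicriteria_seed(grid, k=3, beta=2.0, rng=random.Random(0))
    print(len(seeds), "centers, cost", clustering_cost(grid, seeds))
```

The bi-criteria idea is visible in `bicriteria_seed`: the cost of the βk sampled centers is compared against the optimal clustering with only k centers.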

artificial intelligence, lemma 3, machine learning, (17 more...)

Country: North America > United States (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)